AIbase

# End-to-end audio processing

Voila Autonomous Preview
MIT
Voila is a large family of speech-language foundation models designed to enhance human-computer interaction, supporting real-time, low-latency voice interaction and multilingual processing.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
332
8
Voila Tokenizer
MIT
Voila is a large-scale voice-language foundation model series designed to enhance human-computer interaction, supporting multiple audio tasks and languages.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
4,912
3
Ast Finetuned Speech Commands V2
BSD-3-Clause
An audio spectrogram transformer model fine-tuned on the Speech Commands v2 dataset for audio classification tasks, achieving 98.12% accuracy.
Audio Classification, Transformers
MIT
10.94k
15